20/09/2018

Welcome to Data Handling: I.C.V. 2018!

  • Fire up your notebooks!
  • Go to this page: http://bit.ly/datahandling2018
  • Use one row to respond to the questions in the column headers (see the first two rows for examples).

Introductory Example

Data input, processing, output

The Data Science Pipeline/Workflow

Data Science workflow. Source: Wickham and Grolemund (2017)

What could be the output of all this?

  • Research report/paper (e.g., BA Thesis)
  • Presentation/Slides
  • Website
  • Web application (interactive; as in the introductory example)
  • Dashboard for management
  • Recommender system (i.e., a trained machine learning algorithm)

Background

Technological change

Source: statista.com.

Top: Number of mentions of the terms 'Big Data' or 'Artificial Intelligence' in academic and media sources, 1980-2016. Bottom: Number of mentions in The New York Times and The Wall Street Journal, used as proxies for U.S. mainstream media and business media. Note logarithmic y-axis scale. Source: Katz (2017).

Course Structure

Course Concept

  • Lectures (every Thursday morning)
    • Background/Concepts
    • Live demonstrations of concepts
    • Illustration of 'hands-on' approaches
  • Workshops/Exercises (bi-weekly evening sessions)
    • Guided tutorials
    • Discussion of homework exercises
    • Recap of theoretical concepts
  • Guest Lectures

15/11/2018: Guest Lecture by Dr. Michael Zehnder

Michael Zehnder, PhD, Trium EMBA
Co-Founder & CEO Swiss Data Labs AG

Guest Lecture: Dr. Christian Ulbrich?

Part I: Data (Science) Fundamentals

Date Topic
20.09.2018 Introduction: Big Data/Data Science, course overview
27.09.2018 An introduction to data and data processing
27.09.2018 Exercises/Workshop 1: Tools, working with text files
04.10.2018 Data storage and data structures
11.10.2018 'Big Data' from the Web
11.10.2018 Exercises/Workshop 2: Computer code and data storage

Part II: Data Gathering and Preparation

Date Topic
18.10.2018 Programming with data
25.10.2018 Data sources, data gathering, data import
25.10.2018 Exercises/Workshop 3: Programming with Data
15.11.2018 Guest Lecture: Dr. Michael Zehnder (Swiss Data Labs, gateB)
22.11.2018 Data preparation and manipulation
22.11.2018 Exercises/Workshop 4: Data import and data preparation/manipulation
29.11.2018 Case Study: The Programmable Web, Big Public Data, and Political economics

Part III: Analysis, Visualization, Output

Date Topic
06.12.2018 Basic statistics with R
06.12.2018 Exercises/Workshop 5: Applied data analysis with R
13.12.2018 Visualization, dynamic documents
20.12.2018 Exercises/Workshop 6: Visualization, dynamic documents
20.12.2018 Wrap-Up, Q&A

Main textbooks

Exam Information

  • Central, written examination.
  • Multiple choice questions.
  • Theoretical concepts and practical applications in R (questions based on code examples).

Q&A

References